Using Multiple Edit Distances to Automatically Rank Machine Translation Output

Authors

  • Yasuhiro Akiba
  • Kenji Imamura
  • Eiichiro Sumita
Abstract

This paper addresses the challenging problem of automatically evaluating output from machine translation (MT) systems in order to support their developers. Conventional approaches assign a rank such as A, B, C, or D to MT output according to a single edit distance between the output and a correct translation example. The single edit distance can be designed in different ways, but changing its design makes the assignment of one rank more accurate while making another rank less accurate. This trade-off limits the overall accuracy of rank assignment. To overcome this obstacle, this paper proposes an automatic ranking method that uses multiple edit distances to encode machine-translated sentences, each with a human-assigned rank, as multi-dimensional vectors, from which a rank classifier is learned in the form of a decision tree (DT). The proposed method then assigns a rank to new MT output through the learned DT. The method is evaluated on transcribed texts of real conversations in the travel arrangement domain. Experimental results show that it is more accurate than single-edit-distance-based ranking methods in both closed and open tests. Moreover, in some cases it estimates MT quality within 3% error.
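To make the pipeline in the abstract concrete, the sketch below encodes each MT output as a vector of several differently designed edit distances to a reference translation and trains a decision tree on human-assigned ranks. This is only an illustration, not the authors' implementation: the particular distance variants, the toy training pairs, and the use of scikit-learn's DecisionTreeClassifier are all assumptions.

    # A sketch only: the distance variants, toy data, and scikit-learn
    # DecisionTreeClassifier are assumptions, not the paper's implementation.
    from sklearn.tree import DecisionTreeClassifier

    def edit_distance(a, b):
        """Levenshtein distance between two sequences (single-row DP)."""
        m, n = len(a), len(b)
        dp = list(range(n + 1))
        for i in range(1, m + 1):
            prev, dp[0] = dp[0], i
            for j in range(1, n + 1):
                cur = dp[j]
                dp[j] = min(dp[j] + 1,                          # delete a[i-1]
                            dp[j - 1] + 1,                      # insert b[j-1]
                            prev + (a[i - 1] != b[j - 1]))      # substitute
                prev = cur
        return dp[n]

    def features(hypothesis, reference):
        """Encode one MT output as a vector of differently designed edit distances."""
        hyp_w, ref_w = hypothesis.split(), reference.split()
        return [
            edit_distance(hyp_w, ref_w) / max(len(ref_w), 1),                  # word level
            edit_distance(hypothesis, reference) / max(len(reference), 1),     # character level
            edit_distance(sorted(hyp_w), sorted(ref_w)) / max(len(ref_w), 1),  # word order ignored
        ]

    # Hypothetical training pairs: (MT output, reference translation, human rank).
    train = [
        ("i would like to book a room", "i would like to reserve a room", "A"),
        ("room reserve i want",         "i would like to reserve a room", "D"),
    ]
    X = [features(hyp, ref) for hyp, ref, _ in train]
    y = [rank for _, _, rank in train]

    dt = DecisionTreeClassifier().fit(X, y)   # learn the rank classifier
    print(dt.predict([features("i want to reserve a room",
                               "i would like to reserve a room")]))

In the paper the learned DT ranks real MT output into the human grades A through D; the two sentences above are placeholders used only to show the shape of the data.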


Similar articles

Named Entity Transliteration Generation Leveraging Statistical Machine Translation Technology

Automatically identifying that different orthographic variants of a name refer to the same name is a significant challenge for natural language processing, since such variants typically constitute the bulk of out-of-vocabulary tokens. The problem is exacerbated when the name is foreign. In this paper we address the problem of generating valid orthographic variants for proper names, ...

Evaluating machine translation output with automatic sentence segmentation

This paper presents a novel automatic sentence segmentation method for evaluating machine translation output with possibly erroneous sentence boundaries. The algorithm can process translation hypotheses with segment boundaries which do not correspond to the reference segment boundaries, or a completely unsegmented text stream. Thus, the method is especially useful for evaluating translations of...

Evaluating Human Correction Quality for Machine Translation from Crowdsourcing

Machine translation (MT) technology is becoming more and more pervasive, yet the quality of MT output is still not ideal. Thus, human corrections are used to edit the output for further studies. However, judging the human corrections can be tricky when the annotators are not experts. We present a novel way that uses cross-validation to automatically judge the human corrections where each ...

DTED: Evaluation of Machine Translation Structure Using Dependency Parsing and Tree Edit Distance

We present DTED, a submission to the WMT 2016 Metrics Task using structural information generated by dependency parsing and evaluated using tree edit distances. In this paper we apply this system to translations produced during WMT 2015, and compare our scores with human rankings from that year. We find moderate correlations, despite the human judgements being based on all aspects of the senten...

What can we learn about the selection mechanism for post-editing?

Post-editing is an increasingly common form of human-machine cooperation for translation. One possible support for the post-editing task is offering several machine outputs to a human translator, who can then choose the most suitable one. This paper investigates the selection process for such a method to gain better insight into it, so that it can be optimally automatised in future work. Ex...

Publication year: 2001